Abstract: With the rapid development of software defined
networking and network function virtualization, researchers have 
proposed a new cloud networking model called Network-as-a- 
Service (NaaS) which enables both in-network packet processing 
and application-specific network control. In this paper, we revisit 
the problem of achieving network energy efficiency in data 
centers and identify some new optimization challenges under the 
NaaS model. Particularly, we extend the energy-efficient routing 
optimization from single-resource to multi-resource settings. We 
characterize the problem through a detailed model and provide 
a formal problem definition. Due to the high complexity of direct 
solutions, we propose a greedy routing scheme to approximate 
the optimum, where flows are selected progressively to exhaust
residual capacities of active nodes, and routing paths are assigned
based on the distributions of both node residual capacities and
flow demands. By leveraging the structural regularity of data
center networks, we also provide a fast topology-aware heuristic 
method based on hierarchically solving a series of vector bin 
packing instances. Our simulations show that the proposed 
routing scheme can achieve significant gain on energy savings 
and the topology-aware heuristic can produce comparably good 
results while reducing the computation time to a large extent. 

I. Introduction

With the widespread adoption of cloud computing, many
large-scale data centers have been deployed by companies
like Google, Microsoft, and Amazon to provide online
services including searching and social networking. Generally
speaking, data centers are consolidated facilities holding tens
of thousands of servers that are connected by a well-structured
network termed data center network (DCN). Despite some
designs that rely on specialized hardware and communication 
protocols, most of the DCN architectures leverage commodity 
Ethernet switches and routers to interconnect servers, and thus 
are compatible with TCP/IP applications. 

As inter-node communication bandwidth is the principal 
bottleneck in data centers, there has been a large body of 
work on optimizing the performance of DCNs (e.g., El, (Jl). 
However, in order to apply these proposals to production 
DCNs, a lot of effort has to be undertaken, including both
hardware- and software-end modifications. This is due to the
specific deployment settings designed for the routing and
forwarding protocols used in current data centers. As a result,
incremental implementations are usually not achievable and
significant effort has to be made for every single design.

Figure 1. An overview of a general NaaS implementation in a typical cloud
data center. The control plane is separated from the data plane by the SDN
while the data plane can be customized through NFV.

The situation has changed with the evolution of
network architecture. On the one hand, researchers proposed
to decouple the control plane from the data plane, which enables
rapid innovation in network control. This idea then naturally
led to the advent of Software Defined Networking (SDN).
Instead of having all network nodes run the routing protocol
in a distributed manner, SDN abstracts the network control
functionality to a logically centralized controller. The routing
decisions are then made by the controller and pushed
to network nodes through a well-defined Application Programming
Interface (API) such as OpenFlow m. As a result, the
optimization work can be implemented entirely in the controller,
which needs only very basic software modification in the event
of network changes. On the other hand, innovation in the
data plane has also been sped up by the technology called
Network Function Virtualization (NFV), where packets are
handled by software-based entities on general-purpose servers
with network functions virtualized.

Taking advantage of the advancement of both the control plane
and the data plane in networks, a new cloud networking model
called Network-as-a-Service (NaaS) has been recently proposed
(H, Q. An overview of a general NaaS implementation
can be found in Fig. 1. In the NaaS model, packet forwarding
decisions are implemented based on specific application needs.
Moreover, NaaS enables the design of in-network packet modification,
and thus in-network services such as data aggregation,
stream processing, or caching can be specified by upper-layer
applications. Based on this new networking model, several
working examples have been studied, including in-network
aggregation for big data applications [6].

Although there is still a range of research challenges for
a widespread adoption of NaaS, we observe that some fundamental
problems have emerged from this new model. It is
recognized that no matter what networking model is employed,
problems such as load balancing and energy saving always
remain important. However, compared to the traditional
packet-forwarding-centric model, NaaS brings new challenges
to these problems by allowing in-network packet operations,
rendering existing solutions inefficient or even inapplicable.

In particular, we revisit the problem of achieving network
energy efficiency in data centers and identify some new challenges
under the NaaS model. With packets being processed
by general-purpose servers, energy-related issues become more
prominent. The energy saving problem in DCNs has been
widely studied and most proposals are based on the general
approach of consolidating network flows and turning off
unused network elements (e.g., Cl, El, O, Cni). In packet
forwarding networks, link utilization is the most important
criterion for flow consolidation. However, this is no longer
valid under the NaaS model, where a network node can be
congested not only by data communications, but also by the
overloading of other resources such as processing units or
memory. Without considering other resources, a link-utilization-oriented
consolidation of flows may lead to severe resource
saturation at some network nodes and to serious network
instability. Therefore, it is essential to take into account
multiple resources when making routing decisions under the
NaaS model. To the best of our knowledge, this is the first
research attempt towards multi-resource traffic engineering.

The main contributions of this paper are as follows: i) we
identify new research challenges for conventional optimization
problems under the NaaS model, and characterize the network
energy saving problem through a detailed model together with
a complexity analysis; ii) we propose a greedy routing scheme
where path selection is done based on the distributions of
node residual capacities and flow demands; iii) by utilizing
the structural property of DCNs, we provide a topology-aware
heuristic which can accelerate problem solving while producing
comparably good results; iv) we validate the efficiency of
the proposed algorithms by extensive simulations and show
that significant energy efficiency gain in NaaS-enabled DCNs
can be achieved by the techniques proposed in this paper.

The remainder of this paper is organized as follows. Section II
summarizes the related work. Section III gives the
model and the definition of the problem, as well as some
complexity analysis. Section IV proposes a greedy routing
scheme and Section V presents a topology-aware heuristic.
Section VI examines the performance of the proposed algorithms
by simulations. Section VIII concludes the paper.


II. Related Work

In this section, we review the existing work related to our
study from several viewpoints.

Software defined networking. The tight coupling of
the control plane and the data plane in traditional networks
brings very high complexity to network management and leads
to a very slow pace of development and evolution of network
functionalities due to the reliance on proprietary hardware.
SDN aims to solve these problems by changing the design
and management of a network in the following two ways:
in an SDN, the control plane and the data plane are clearly
separated, and the control plane is logically centralized. The
control plane thus has a full view of the network and
can be implemented with a single software control program.
Through a well-defined API, the control plane exercises
direct control by pushing decisions to multiple data-plane
elements in the network. The logically centralized control
can facilitate most network applications including network
virtualization ifTTI . server load balancing (TJl, and energy-efficient
networking [7], [10], which would require enormous
efforts to be implemented in a totally distributed environment.

Software packet processing. Recently, researchers have
argued for building evolvable networks, whose functionality
changes with the needs of their users and is not tied to particular
hardware vendors ca. This is called general-purpose
networking, where a network-programming framework is run
on top of commodity general-purpose hardware. On the one
hand, there have been several research prototypes demonstrating
that general-purpose hardware is capable of high-performance
packet processing when packets are subjected to
a single particular type of processing, such as IP forwarding
ifTSl or cryptographic operations ifTH . It has also been shown
in 03 that such a software packet processing platform can
achieve predictable performance while running various packet-processing
applications and serving multiple clients with different
needs. On the other hand, Niccolini et al. ca developed
software mechanisms that exploit the underlying hardware's
power management features for more energy-efficient packet
processing in software routers.

Network-as-a-Service. With the development of SDN and
software packet processing, a new networking model called
NaaS has been recently proposed H, E). Under this model,
the network is built on an SDN implementation
where a centralized controller takes charge of flow management,
including routing path assignment and network
function interposition. The network is composed of general-purpose
servers with multiple network ports connected by
high-speed links. In each node in the network, network functions
for packet processing are virtualized and can be invoked
through the controller by upper-layer applications according to
their needs. Several research attempts have already been made
to adopt NaaS in real data centers El.

Energy-efficient DCN. The topic of achieving energy-efficient
data center networks has been extensively studied. Research
efforts are concentrated on the following two categories.
One is designing new architectures that involve fewer network
devices while providing similar end-to-end connectivity lEl.
The other is applying traffic engineering techniques to consolidate
network flows and switch off unused network elements
(switches or links). The seminal work in this category is the
concept of ElasticTree proposed by Heller et al. id, which
is a network-wide power manager that dynamically adjusts
the set of active network devices to satisfy changing traffic
loads in DCNs. Shang et al. CD discussed how energy can
be saved by energy-aware routing with negligible performance
degradation in high-density DCNs. Some follow-up works
include REsPoNse 0, CARPO 0 and GreenDCN CO).

Multi-resource allocation. Resource allocation in computing
systems has been widely discussed under the limit of a
single resource, such as CPU time or link bandwidth. While
multi-resource allocation is considered in cloud computing
systems, it is usually carried out with a slot-based single
resource abstraction. Recently, researchers have made some
efforts (e.g., Oil, Gol) towards multi-resource fair sharing
under the Dominant Resource Fairness (DRF) model proposed
by Ghodsi et al. [21]. On the network side, DRF-based approaches
(e.g., 1^ . 1^ ) have also been proposed to achieve
multi-resource fair queueing in software packet processors.

Compared to these studies, our work is unique
in the sense that we are the first to target the problem
of achieving network-wide energy efficiency under multi-resource
settings based on the NaaS model.

III. Problem Statement 

On the server side, the problem of finding the minimum
number of servers to accommodate a given set of tasks
whose resource requirements are characterized by vectors is
defined as the Vector Bin Packing (VBP) problem. The best
known solution for the general form of the problem is a
$(\ln K + 1 + \epsilon)$-approximation for any $\epsilon > 0$, provided by
Bansal et al. [24], where $K$ is the number of dimensions of each
item. While the single-resource network energy optimization
problem has been well studied, the energy-efficient routing
problem in networks with multiple resources has received
very little attention. With the emerging trend of software
packet processing in networks, this problem has gained
significance. In the following, we provide a formal model
of the problem and examine its complexity.

A. Preliminary Notations 

We abstract a given software packet-processing network as a
graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, where $\mathcal{V}$ is the set of $N$ nodes, each
of which represents a general-purpose server with software
packet processing functionalities, and $\mathcal{E}$ is the set of undirected
edges representing the network links. Each node $v \in \mathcal{V}$ has
limited amounts of $K$ different types of hardware resources,
such as CPU, memory, and network bandwidth.
The total amount of type-$k$ resource is constrained by
a positive capacity $C_{v,k}$ ($k \in \{1, 2, \ldots, K\}$). Because
packet-processing networks are usually constructed using
commodity general-purpose servers, it is reasonable to assume
that all the nodes in $\mathcal{V}$ are identical. Thus, for all $v \in \mathcal{V}$, we
assume $C_{v,k} = C_k$ for all $k \in \{1, 2, \ldots, K\}$.

We define a flow as a sequence of data packets that share
the same fields in the packet headers, such as the same source
and destination IP addresses. Suppose we are given a set of
$M$ flow demands $\mathcal{D} = \{d_1, d_2, \ldots, d_M\}$. The packets from
the same flow $d_m$ will be routed following a single path in
order to avoid packet reordering at the destination. For all the
packets from a given flow, a processing procedure is defined on
every node on the flow's routing path, which is used to carry
out some per-flow computation on the payloads of packets,
e.g., intercepting packets on-path to implement opportunistic
caching strategies 0. Because the data carried by
the packets from the same flow generally possess the same
structure (e.g., same packet size), we assume that (nearly) the
same amount of computation will be applied to the packets
from the same flow. As a result, we have to keep (almost) the
same reservation across each type of resource on every node
on the path for each flow. Each flow $d_m$ is represented by
a three-tuple $(v_m^s, v_m^t, R_m)$, where $v_m^s$ and $v_m^t$ are the source
and destination respectively, while $R_m$ is a $K$-dimensional
vector $(r_{m,1}, \ldots, r_{m,K})$ describing the amounts of resources
of all types required (and reserved) for a node to process
the packets from flow $d_m$. These resource demands can be
obtained by applying the same technique used in 12^ . For the
sake of simplicity and without loss of generality, we assume
that the $r_{m,k}$ for $m \in \{1, 2, \ldots, M\}$ are normalized by $C_k$ for
any $k \in \{1, 2, \ldots, K\}$, i.e., $R_m \in [0, 1]^K$.

To quantify the performance of approximations, we call $\gamma$
the performance ratio of an algorithm for a minimization
problem if the objective values of the solutions provided by
the algorithm are upper-bounded by $\gamma$ times the optimum.

B. Problem Formulation 

Using the introduced notations, the energy-efficient multi-resource
routing problem can be formally defined as follows.
For a vector $x$, we denote by $\|x\|_\infty$ the standard $\ell_\infty$ norm.

Definition 1 (ENERGY-EFFICIENT MULTI-RESOURCE
ROUTING (EEMR)). Given a network $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ and a
set of $M$ flows $d_1, \ldots, d_M$ whose demands are characterized
by $R_1, \ldots, R_M$ from $[0, 1]^K$, find a path $\mathcal{P}_m$ from its source to
its destination for each flow $d_m$ such that $\|A_v\|_\infty \le 1$ for all
$v \in \mathcal{V}$, where $A_v = \sum_{m : v \in \mathcal{P}_m} R_m$
is the aggregation of the resource requirement
vectors of flows that are routed through node $v$. The objective
is to minimize $|\mathcal{Q}|$, where $\mathcal{Q} = \{v \mid v \in \mathcal{V} \wedge A_v \ne (0, \ldots, 0)\}$
is the set of nodes that are used to carry flows.
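
As an illustration of Definition 1, the following Python sketch checks whether a candidate assignment of paths respects the per-node capacity condition and counts the nodes that carry flows. The function name `evaluate_routing`, the list-based data layout, and the choice to count the endpoints of a path as nodes that process the flow are assumptions made for this example only.

```python
from typing import Dict, List, Sequence, Tuple


def evaluate_routing(
    demands: Sequence[Sequence[float]],
    paths: Sequence[Sequence[str]],
    tol: float = 1e-9,
) -> Tuple[bool, int]:
    """Return (feasible, number_of_active_nodes) for a candidate routing.

    demands[m] is the normalized K-dimensional demand vector of flow m and
    paths[m] is the list of nodes on its path (endpoints included, which is
    an assumption of this sketch).
    """
    load: Dict[str, List[float]] = {}
    for r_m, path in zip(demands, paths):
        for v in path:
            # Aggregate the demand vectors of all flows routed through v.
            a_v = load.setdefault(v, [0.0] * len(r_m))
            for k, r in enumerate(r_m):
                a_v[k] += r
    # The infinity-norm condition: no resource dimension exceeds capacity 1.
    feasible = all(max(a_v) <= 1.0 + tol for a_v in load.values())
    # A node is active if its aggregated load is not the zero vector.
    active = sum(1 for a_v in load.values() if any(x > 0 for x in a_v))
    return feasible, active
```

For instance, two flows with demands (0.6, 0.4, 0.1) and (0.1, 0.3, 0.4) routed over the same three nodes give a feasible routing with three active nodes.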

The EEMR problem can be formulated as a Mixed Integer
Program (MIP) in the following way. We introduce two sets of binary
variables $x_{m,v}$ and $y_v$. The binary variable $x_{m,v}$ indicates
whether flow $d_m$ is routed through node $v$, and $y_v$ indicates
whether node $v$ is active or not. Our objective is to minimize
the number of active nodes.^1 Note that we have an implicit
assumption that feasible solutions are always achievable, that
is, the network with the designed capability is able to handle
the given traffic demands.

^1 As the static power consumption of a node is dominant, we only consider
using a power-down based strategy as the main energy saving mechanism.

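$$
(P_1) \quad \text{minimize} \quad \sum_{v \in \mathcal{V}} y_v
$$

subject to

$$
\begin{aligned}
& \Big\| \sum_{m=1}^{M} R_m \cdot x_{m,v} \Big\|_\infty \le 1, && v \in \mathcal{V} \\
& x_{m,v} \le y_v, && v \in \mathcal{V},\ 1 \le m \le M \\
& x_{m,v},\, y_v \in \{0, 1\}, && v \in \mathcal{V},\ 1 \le m \le M \\
& x_{m,v}: \text{flow conservation}
\end{aligned}
$$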

The constraints of program $P_1$ are as follows: the first constraint
states that the flows routed through the same node
do not exceed the node capacity in any resource dimension; the second
constraint marks a node as active whenever it carries a flow; the third
constraint ensures that each flow can only follow a single
path. Flow conservation on $x_{m,v}$ forces the nodes that
flow demand $d_m$ is routed through to form a path between
its source and its destination in the network.

Note that when $K = 1$, $P_1$ corresponds to the general capacitated
network design problem, which has been widely studied.
For the uniform link capacitated version of the problem,
Andrews, Antonakopoulos and Zhang 1^ provided a polylogarithmic
approximation when the capacity on each link is
allowed to be exceeded by a polylogarithmic factor. Recently,
ll^ explored the multicommodity node-capacitated network
design problem and provided a polylogarithmic approximation with
polylogarithmic congestion. However, none of the studies can
provide high-quality approximations with capacity constraints
that are inviolable. This is mainly because with strict capacity
constraints, finding out whether there is a feasible solution for
the problem is already NP-hard.

C. Complexity Analysis 

In contrast with the traditional energy-efficient routing (i.e., 
capacitated network design) problem, EEMR extends the con¬ 
cept of “load” from single-dimensional to multidimensional, 
which makes the problem even computationally harder. In 
general, we have the following complexity results. 

Theorem 1. Solving the EEMR problem is NP-hard. 

Proof. The proof is by a polynomial-time reduction
from the VBP problem, which is known to be NP-hard.
Assume we are given an arbitrary instance of VBP; we
reduce it to the EEMR problem in the following way:
each bin in the VBP instance is a node in the network for
EEMR and each such node is connected to two extra nodes src
and dst. Each item in the VBP instance represents a flow
which originates from src and ends at dst and has resource
requirements characterized by the vector for the item. Then,
straightforwardly, an optimal solution for EEMR
corresponds to an optimal solution to VBP
with the same structure. As a result, any polynomial-time
algorithm that optimally solves EEMR could also be used to
solve VBP optimally, which is impossible unless P = NP
because VBP is NP-hard. □

Theorem 2. There is no asymptotic PTAS for the EEMR 
problem unless P=NP. 

This follows directly from the fact that VBP with $K \ge 2$
is known to be APX-hard, which implies that there is no
asymptotic PTAS for it Gzl. From the above reduction we
know that VBP is a special case of EEMR, meaning
that EEMR is at least as hard as VBP.

IV. Energy-efficient Multi-resource Routing 

The complexity analysis results show that the EEMR problem
is NP-hard, for which no existing exact solutions can scale
to the size of current data center networks. Therefore, we resort 
to an intuitive approach that can provide suboptimal solutions 
very quickly. We detail our design in this section. 

A. Key Observations 

We propose a greedy routing scheme to solve the energy-efficient
multi-resource routing problem. The general idea is to
use as few nodes as possible to carry all the traffic flows while
maintaining the capacity constraints in all resource dimensions.
More specifically, our design is based on the following two
observations: i) flows should preferably follow paths that consist of
more active nodes (that already carry some traffic) as this will
introduce less extra energy consumption to the network; ii) it
is important to allocate routes for flows on the active nodes
such that all dimensions of the resources in every active node
can be fully utilized.

The second observation is a new concern stemming from the
multi-resource context. In the single-resource case, the only
criterion for the efficiency of a node is its resource utilization,
i.e., the carried traffic divided by the total capacity. As a result,
steering flows to those nodes with low utilizations will lead to
an energy-efficient routing solution. However, this approach
is not applicable to the multi-resource case. With multiple
dimensions of resources, it is not clear how to define the
resource utilization of a node, thus we will not be able to make
routing decisions based on node utilizations. For instance,
given two load vectors (0.6, 0.4, 0.1) and (0.4, 0.4, 0.3) of a
node, and a flow demand vector (0.1, 0.3, 0.4), the resulting
loads of this node when routing the given flow under the
two different load levels are (0.7, 0.7, 0.5) and (0.5, 0.7, 0.7),
which are not directly comparable. In order to overcome this
obstacle, we will provide a measuring method based on the
distributions of node residual capacities and flow demands.
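
The numerical example above can be reproduced with a few lines of Python. The helper names `route_flow` and `dominates` are hypothetical and introduced only for this illustration; the sketch shows that neither resulting load vector is component-wise smaller than the other, which is why a scalar utilization is not well defined here.

```python
def route_flow(load, demand):
    """Load vector of a node after it additionally carries `demand`."""
    # Rounding only removes floating-point noise from the sums.
    return tuple(round(l + d, 10) for l, d in zip(load, demand))


def dominates(a, b):
    """True if load `a` is no larger than `b` in every dimension and
    strictly smaller in at least one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


demand = (0.1, 0.3, 0.4)
after_1 = route_flow((0.6, 0.4, 0.1), demand)  # first load level
after_2 = route_flow((0.4, 0.4, 0.3), demand)  # second load level
# Neither vector dominates the other, so the two outcomes cannot be ranked
# by a simple component-wise comparison.
comparable = dominates(after_1, after_2) or dominates(after_2, after_1)
```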

B. The Routing Scheme 

The pseudocode of the routing scheme is shown in Algorithm 1.
The algorithm runs progressively. In each iteration,
it first tries to use only the set of active nodes. By searching
the flow demand list, it tries to find a candidate flow to
route on the subnetwork $\mathcal{G}_a$ composed of the active nodes and
the corresponding network links connecting these nodes. Note
that it is necessary to remove the nodes that are not capable of
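Algorithm 1 Multi-Resource Green (MRG) routing

$\mathcal{V}_a = \emptyset$; /* set of active nodes */
for each ($v \in \mathcal{V}$) $S_v = \{1\}^K$; /* residual resources */
$\mathcal{E}_a = \{(v_1, v_2) \in \mathcal{E} \mid v_1, v_2 \in \mathcal{V}_a\}$; $\mathcal{G}_a = \{\mathcal{V}_a, \mathcal{E}_a\}$;
while ($\mathcal{D}$ is not empty)
    $d_c$ = none;
    for each ($d_m \in \mathcal{D}$) /* search for a candidate flow */
        $\mathcal{G}_a^m = \mathcal{G}_a \setminus \{v \mid S_v < R_m\}$; /* remove incapable nodes */
        if (IsConn($\mathcal{G}_a^m$, $v_m^s$, $v_m^t$) == true)
            $d_c = d_m$; $\mathcal{G}_c = \mathcal{G}_a^m$;
            break;
    if ($d_c$ == none) /* candidate flow not found */
        $d_c$ = RandSelect($\mathcal{D}$);
        $\mathcal{G}_c = \mathcal{G} \setminus \{v \mid S_v < R_c\}$;
    for each ($v \in \mathcal{V}$) /* node weight assignment */
        if ($v \in \mathcal{V}_a$) $w_v$ = InvCount($S_v$, $R_c$);
        else $w_v = K(K-1)/2 + 1$;
    for each ($(v_1, v_2) \in \mathcal{E}_c$) $w_e = (w_{v_1} + w_{v_2})/2$;
    $\mathcal{P}_c$ = SPath($\mathcal{G}_c$, $d_c$); /* shortest path routing */
    $\mathcal{V}_a = \mathcal{V}_a \cup \mathcal{P}_c$; $\mathcal{D} = \mathcal{D} \setminus \{d_c\}$;
    for each ($v \in \mathcal{P}_c$) $S_v = S_v - R_c$;
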

carrying the flow, that is, nodes on which at least one dimension
of the resource capacities would be violated if they carried the
flow, leading to node congestion. We denote by
$\mathcal{G}_a^m$ the residual network after removing the incapable nodes and
the links attached to these nodes. We then carry out function
IsConn, a depth-first search procedure, to verify if the source
and the destination of the current flow are connected in $\mathcal{G}_a^m$.
If a candidate flow $d_c$ that can be routed on $\mathcal{G}_a^m$ is found,
we stop the search procedure; otherwise we pick a flow
demand uniformly at random (function RandSelect) from the
flow demand list. In the latter case, the residual capacity of the
subnetwork formed by the current active nodes is not sufficient
for carrying any new flow, thus more nodes need to be
activated so that the routing demand of the newly selected flow
can be satisfied. Once a candidate flow $d_c$ has been determined,
we remove the incapable nodes (those that satisfy $S_v < R_c$,
which means that there exists at least one dimension $k$ such
that $S_v(k) < R_c(k)$) according to the resource demand of
the candidate flow and we denote by $\mathcal{G}_c$ the resulting network.
Then, we apply a weight assignment process where we assign
weights to the active nodes in $\mathcal{V}_a$ by invoking procedure
InvCount (see below), and set the weights of the other nodes to
$K(K-1)/2 + 1$. In order to facilitate path selection,
we carry out a node-link transformation procedure to assign
weights to links based on the weights of nodes. The design
of node weight assignment and node-link transformation will
be detailed later in this section. Finally, the candidate flow
is routed by invoking a shortest-path-based algorithm
such as Dijkstra's algorithm on the weighted network $\mathcal{G}_c$ and
is removed from the demand list. The above process is
repeated until the route for every flow has been assigned.
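
The procedure described above can be sketched in Python as follows. This is a minimal illustration under several assumptions that are not part of the original text: the network is given as an adjacency dictionary, capacities are normalized to 1, the connectivity test reuses the shortest-path routine with zero weights instead of a dedicated depth-first search, and the names `mrg`, `shortest_path`, `inversions`, and `fits` are hypothetical.

```python
import heapq
import random
from itertools import combinations


def inversions(residual, demand):
    """Number of index pairs ordered oppositely in the two vectors."""
    return sum(1 for i, j in combinations(range(len(residual)), 2)
               if (residual[i] - residual[j]) * (demand[i] - demand[j]) < 0)


def fits(residual, demand, tol=1e-9):
    """True if the demand does not exceed the residual in any dimension."""
    return all(s + tol >= r for s, r in zip(residual, demand))


def shortest_path(adj, allowed, weight, src, dst):
    """Dijkstra on the subgraph induced by `allowed`; the weight of a link
    is the mean of its endpoint weights. Returns a node list or None."""
    if src not in allowed or dst not in allowed:
        return None
    dist, prev, heap = {src: 0.0}, {}, [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            path = [dst]
            while path[-1] != src:
                path.append(prev[path[-1]])
            return path[::-1]
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v in adj[u]:
            if v not in allowed:
                continue
            nd = d + (weight[u] + weight[v]) / 2
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    return None


def mrg(adj, flows, K, seed=0):
    """flows: list of (src, dst, demand). Returns {flow index: path}."""
    rng = random.Random(seed)
    residual = {v: [1.0] * K for v in adj}
    active, pending, routes = set(), list(range(len(flows))), {}
    while pending:
        cand = None
        for m in pending:  # try to route on capable active nodes only
            s, t, r = flows[m]
            capable = {v for v in active if fits(residual[v], r)}
            if shortest_path(adj, capable, dict.fromkeys(capable, 0), s, t):
                cand, allowed = m, capable
                break
        if cand is None:  # no flow fits: new nodes must be activated
            cand = rng.choice(pending)
            allowed = {v for v in adj if fits(residual[v], flows[cand][2])}
        s, t, r = flows[cand]
        # Active nodes are weighted by their inversion count; inactive nodes
        # get one more than the maximum possible count.
        weight = {v: inversions(residual[v], r) if v in active
                  else K * (K - 1) // 2 + 1 for v in allowed}
        path = shortest_path(adj, allowed, weight, s, t)
        if path is None:
            raise ValueError("no feasible path for flow %d" % cand)
        routes[cand] = path
        pending.remove(cand)
        active.update(path)
        for v in path:
            residual[v] = [a - b for a, b in zip(residual[v], r)]
    return routes
```

On a four-node diamond network with two identical flows, the second flow reuses the nodes activated by the first, so one of the two intermediate nodes stays inactive.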


Inversion-based node weight assignment. We now describe
the function InvCount for assigning weights to network
nodes. The second observation we mentioned at the beginning 
of this section suggests that once we have obtained a candidate 
flow to route on the subnetwork comprised of the active 
capable nodes, it is important to decide which nodes are 
preferable to carry the candidate flow. We provide a measure 
based on the distributions of the load vectors of both the node 
residual capacities and the flow demand. The general notion is 
that if the resource dimensions of a node are all kept balanced, 
then more flows will likely fit into the node. As a consequence, 
the number of nodes that need to be active will be reduced. 
To clarify, we first introduce the concept of inversion. 

Definition 2. Given two vectors $X = (x_1, \ldots, x_n)$ and
$Y = (y_1, \ldots, y_n)$, an inversion is defined as the condition $x_i > x_j$
and $y_i < y_j$, $1 \le i, j \le n$.

Property 1. Given two vectors in $n$ dimensions, the total
number of inversions is upper bounded by $n(n-1)/2$.

As we are focusing on the distributions of the node residual 
capacities and the flow resources demands, it is straightforward 
that an inversion can lead to much heavier resource dimensions 
imbalance on a node as the scarce resource is demanded 
more and the abundant resource is demanded less. Therefore, 
in order to keep all the dimensions of resources balanced, 
the number of inversions has to be minimized. Based on 
this principle, the inversion-based node weight assignment 
procedure assigns weights for nodes that are already active 
according to the number of inversions shared by the node 
residual capacity vector and the flow demand vector. The 
weights of the inactive nodes are set to be one unit larger 
than the maximum number of inversions that can be shared 
by any residual capacity vector and flow demand vector. As
a result, if possible, the nodes that are active and have fewer
inversions will be preferably chosen to carry the
candidate flow, and the inactive nodes have the lowest priority
to be used.
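
One possible way to count inversions between a residual capacity vector and a demand vector is sketched below. The merge-sort based counting is an assumption of this example (the text does not prescribe an implementation); it runs in $O(K \log K)$ time per node, and the function name `inv_count` simply mirrors the procedure name used in the text.

```python
def inv_count(residual, demand):
    """Count pairs (i, j) with residual[i] > residual[j] and
    demand[i] < demand[j], following Definition 2."""
    # Sort by residual (ties broken by demand) so that only pairs with
    # strictly different residuals can show up as strict inversions below.
    seq = [d for _, d in sorted(zip(residual, demand))]

    def sort_count(a):
        # Classic merge sort that also counts strictly decreasing pairs.
        if len(a) <= 1:
            return a, 0
        mid = len(a) // 2
        left, n_left = sort_count(a[:mid])
        right, n_right = sort_count(a[mid:])
        merged, count, i, j = [], n_left + n_right, 0, 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                merged.append(left[i])
                i += 1
            else:
                merged.append(right[j])
                j += 1
                count += len(left) - i  # all remaining left items are larger
        merged.extend(left[i:])
        merged.extend(right[j:])
        return merged, count

    return sort_count(seq)[1]
```

For example, a residual vector (0.9, 0.2, 0.5) and a demand vector (0.1, 0.6, 0.3) are ordered oppositely in every pair of dimensions, so the count reaches the upper bound of Property 1 for three dimensions.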

Path selection. The guideline for selecting the route for the 
candidate flow is to choose a path that connects the source 
and the destination of the candidate flow while minimizing 
the total weights of nodes that are on the path. This is 
actually equivalent to solving a node-weighted single-source 
shortest path routing problem. We notice that this problem can 
be transformed into a traditional link-weighted single-source 
shortest path routing problem by setting the weight of each 
link to be half of the sum of the weights of the endpoints
of this link. Denote by $M_1$ and $M_2$ the node-weighted and
the transformed link-weighted shortest path routing problems
respectively. We have the following property.

Property 2. Solving $M_1$ is equivalent to solving $M_2$.

This is because as long as a flow is routed through an intermediate
node (nodes except the source and the destination)
on a path, the weight on this node will be shared by two links
(i.e., the ingress and egress links). Thus, if we let all the links
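Algorithm 2 Hierarchical Green Routing (HGR)

function VBP($\mathcal{D}$) /* vector bin packing algorithm */
    idx = 1; $d_c$ = none; $S_{\mathrm{idx}} = \{1\}^K$;
    compute $\sigma_k$ for $1 \le k \le K$;
    while ($\mathcal{D}$ is not empty)
        $\mathcal{D}_c = \mathcal{D} \setminus \{d_m \in \mathcal{D} \mid S_{\mathrm{idx}} < R_m\}$;
        $d_c = \arg\min_{d_m \in \mathcal{D}_c} \sum_{k=1}^{K} \sigma_k (S_{\mathrm{idx}}(k) - R_m(k))^2$;
        if ($d_c$ == none) idx++; $S_{\mathrm{idx}} = \{1\}^K$; /* open a new bin */
        else $\mathcal{D} = \mathcal{D} \setminus \{d_c\}$; $S_{\mathrm{idx}} = S_{\mathrm{idx}} - R_c$; /* pack the current item */
    return idx
for ($0 \le i \le z - 1$) /* # of aggr. nodes in each pod */
    $N_i^{\mathrm{aggr}}$ = VBP($\{d_m \mid$ source or destination of $d_m$ in pod $i\}$)
for ($0 \le j \le z/2 - 1$) /* # of core nodes */
    $N_j^{\mathrm{core}}$ = VBP($\{d_m \mid d_m$ is carried by the aggregation nodes in position $j\}$)
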


that are attached to an intermediate node share half of the
node weight, the total weight of the links
on a path will always be equal to the total weight of the internal
nodes on this path plus half of the weights of the source and the
destination, which is the same constant for every candidate path.
As a result, solving the corresponding link-weighted
single-source shortest path routing problem will also
give solutions to the path selection for the candidate flow. It is
well known that the link-weighted single-source shortest path
routing problem can be solved efficiently by using Dijkstra's
algorithm.
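
The equivalence can be checked numerically with the short sketch below, where `node_cost` and `link_cost` are hypothetical helper names. For a fixed source and destination, the link-weighted cost of any path differs from its node-weighted cost only by half the sum of the two endpoint weights, so both costs rank candidate paths identically.

```python
def node_cost(path, w):
    """Total weight of the internal nodes of a path."""
    return sum(w[v] for v in path[1:-1])


def link_cost(path, w):
    """Total link weight when each link weighs half the sum of the
    weights of its two endpoints."""
    return sum((w[u] + w[v]) / 2 for u, v in zip(path, path[1:]))


w = {"s": 2, "a": 1, "b": 3, "t": 2}
path_1 = ["s", "a", "b", "t"]
path_2 = ["s", "b", "t"]
# Constant offset contributed by the shared endpoints s and t.
offset = (w["s"] + w["t"]) / 2
```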

C. Time Complexity 

We now analyze the time complexity of the proposed MRG
algorithm. The algorithm runs iteratively and in each iteration
exactly one flow will be chosen as the candidate and will be
routed. As a result, the number of iterations is
upper bounded by the number of flow demands $M$. In each
iteration, the algorithm first searches the flow demand list
to find a candidate flow, and the most time-consuming part
of this candidate flow searching procedure is the depth-first search,
which can be accomplished in $O(|\mathcal{E}|)$ time, where $|\mathcal{E}| \le N^2$
is the total number of edges in the network ($N$ is the total
number of nodes). The total searching time in one iteration
is then $O(M \cdot |\mathcal{E}|)$. Once the candidate flow is found,
the time complexity is dominated by the shortest path
routing algorithm, which can be done in $O(|\mathcal{E}| + N \cdot \log N)$
time, as the weight assignment procedure can be finished in
$O(N \cdot K \cdot \log K)$ time. Combining all these, we have that the
MRG algorithm can be finished in $O(|\mathcal{E}| M^2)$ time.
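
Written out, the combination of the per-iteration costs reads as follows. The final simplification assumes that the candidate search term dominates the weight assignment and shortest path terms, which is an assumption made here to match the stated bound rather than a condition spelled out in the text.

```latex
\begin{align*}
T_{\mathrm{MRG}}
  &= M \cdot \Big[ \underbrace{O(M \cdot |\mathcal{E}|)}_{\text{candidate search}}
     + \underbrace{O(N \cdot K \cdot \log K)}_{\text{weight assignment}}
     + \underbrace{O(|\mathcal{E}| + N \cdot \log N)}_{\text{shortest path}} \Big] \\
  &= O(|\mathcal{E}| \, M^2).
\end{align*}
```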

V. Topology-Aware Heuristic

The proposed MRG algorithm can leverage the coordination of the
flow demands in multiple dimensions and minimize the number of
active network nodes efficiently. However, MRG does not take the
topology features of the network into account. We notice that the
topologies commonly used in data center networks, such as fat-tree
or VL2, have a very high level of symmetry and are usually well
structured in layers. Therefore, we argue that the routing
algorithm can be further improved by taking advantage of the
topology characteristics. In this section, we provide a new
topology-aware heuristic for the most common tree-like data center
network topologies.

The key observation we have from tree-like topologies is that the
number of active nodes can be determined layer by layer. We take a
typical fat-tree topology (as shown in Fig. 2) as an example. The
number of edge nodes cannot be optimized since edge nodes are also
responsible for inter-host communication in the same rack. In each
pod, the number of aggregation nodes can be determined according
to the flow demands that flow out of and into the pod. This
amounts to solving a vector bin packing problem, as introduced
previously. The core layer is slightly different from the
aggregation layer: for a $z$-ary fat-tree, all the core nodes that
are congruent modulo $z/2$ are responsible for carrying the flow
demands from the aggregation nodes in the same positions in every
pod. Thus, for these core nodes, solving a vector bin packing
instance gives the number of nodes that need to stay active.
Inspired by this observation, we propose HGR, a hierarchical
energy-efficient routing algorithm based on solving a set of
vector bin packing problems. The pseudocode of HGR is given in the
accompanying algorithm listing.

Figure 2. An example to show the topology-aware heuristic in a
fat-tree topology, where the set of active nodes is determined
layer by layer.

Vector bin packing. The function VBP we adopted for solving the
vector bin packing problem is a norm-based greedy algorithm [28].
The algorithm is bin-centric, which means that it focuses on one
bin $idx$ and always places the most suitable remaining item that
fits in the bin. To find the most suitable item, the algorithm
looks at the difference between the demand vector $R_m$ and the
residual capacity vector $S_{idx}$ under a certain norm. We choose
the $\ell_2$-norm, and from all unassigned items we choose the
item that minimizes
$\sum_{k=1}^{K} \sigma_k (S_{idx}(k) - R_m(k))^2$, where
$\sigma_k$ represents the importance of dimension $k$ among all
dimensions and is given by

$$\sigma_k = \frac{\sum_{d_m \in \mathcal{D}} R_m(k)}{\sum_{k=1}^{K} R_m(k)}.$$

If no item can be found to fit into the current bin $idx$, we open
a new bin and repeat the above procedure.
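
The bin-centric procedure described above can be sketched in Python
as follows. This is a minimal illustration rather than the
implementation used in the paper: the function names
(`dimension_weights`, `vbp_norm_greedy`) are hypothetical, and the
weight of each dimension is assumed to be the share of the total
demand that falls in that dimension, which is one possible reading
of the definition of the dimension importance.

```python
from typing import List, Sequence


def dimension_weights(items: Sequence[Sequence[float]]) -> List[float]:
    """Assumed weight of dimension k: total demand in k over total demand."""
    num_dims = len(items[0])
    per_dim = [sum(item[k] for item in items) for k in range(num_dims)]
    total = sum(per_dim)
    return [d / total for d in per_dim]


def fits(item: Sequence[float], residual: Sequence[float]) -> bool:
    """An item fits if it does not exceed the residual in any dimension."""
    return all(r <= s + 1e-12 for r, s in zip(item, residual))


def vbp_norm_greedy(items: Sequence[Sequence[float]],
                    capacity: Sequence[float]) -> List[List[int]]:
    """Bin-centric greedy packing; returns one list of item indices per bin."""
    sigma = dimension_weights(items)
    unassigned = set(range(len(items)))
    bins: List[List[int]] = []
    while unassigned:
        residual = list(capacity)  # open a new bin
        current: List[int] = []
        while True:
            candidates = [m for m in sorted(unassigned)
                          if fits(items[m], residual)]
            if not candidates:
                break  # nothing fits any more; close this bin
            # Pick the item closest to the residual capacity under the
            # weighted squared l2 distance.
            best = min(
                candidates,
                key=lambda m: sum(w * (s - r) ** 2
                                  for w, s, r in zip(sigma, residual, items[m])),
            )
            current.append(best)
            unassigned.remove(best)
            residual = [s - r for s, r in zip(residual, items[best])]
        if not current:
            raise ValueError("an item exceeds the capacity of an empty bin")
        bins.append(current)
    return bins
```

In the context of HGR, each bin would correspond to one aggregation
or core node, each item to one flow demand, and the number of bins
returned to the number of nodes that need to stay active.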

Time complexity. The HGR algorithm relies on solving several
instances of the vector bin packing problem. In the worst case,
the sizes of the vector bin packing instances can be as large as
$O(M)$, and thus each instance takes $O(M^2)$ time to be solved by
the VBP algorithm. As a result, the total time complexity of HGR
is $O(M^2)$. Compared to the MRG algorithm, HGR can provide a
speedup of $\Omega(|E|)$. We will validate this speedup by
simulations.
VI. Evaluation

We carried out extensive simulations to evaluate the performance
of the proposed algorithms. In this section, we provide a detailed
summary of our simulation findings.

A. Simulation Settings 

We deploy our algorithms on a laptop with a Core i5 2.6GHz 
CPU with two physical cores and 8GB DRAM. All of the 
algorithms are implemented in Python. 

We choose fat-trees of different sizes as the data center network
topologies. This is because the fat-tree is a typical topology
used in DCNs and provides equal-length parallel paths between any
pair of end hosts, which is very beneficial for the software
packet processing paradigm: processing functions can be embedded
into the routing paths regardless of the topology details. The
flow demands used in our simulations are generated randomly: the
endpoints of each flow are chosen uniformly at random from the set
of end hosts. The requirement of each flow in each resource
dimension is generated following a normal distribution (restricted
to the positive side), where the mean and the variation are both
set to 0.02 to provide large resource demand diversity. The node
capacity in each resource dimension is normalized to 1.
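
The workload described above can be sketched as follows. This is an
illustrative reconstruction, not the simulation code of the paper.
The function names are hypothetical, the "variation" of 0.02 is
assumed to be a standard deviation, non-positive samples are assumed
to be redrawn, and the two endpoints of a flow are assumed to be
distinct hosts. The fat-tree sizes follow the standard counts for a
z-ary fat-tree.

```python
import random
from typing import Dict, List, Tuple


def fat_tree_sizes(z: int) -> Dict[str, int]:
    """Standard node counts of a z-ary fat-tree (z even)."""
    half = z // 2
    return {
        "hosts": z * half * half,   # z pods, z/2 edge switches, z/2 hosts each
        "edge": z * half,
        "aggregation": z * half,
        "core": half * half,
    }


def positive_normal(rng: random.Random, mean: float, spread: float) -> float:
    """Draw from a normal distribution, redrawing until the value is positive."""
    while True:
        x = rng.gauss(mean, spread)
        if x > 0:
            return x


def generate_flows(num_flows: int, num_hosts: int, num_dims: int,
                   rng: random.Random, mean: float = 0.02,
                   spread: float = 0.02) -> List[Tuple[int, int, List[float]]]:
    """Return (source, destination, demand vector) triples."""
    flows = []
    for _ in range(num_flows):
        src, dst = rng.sample(range(num_hosts), 2)  # two distinct end hosts
        demand = [positive_normal(rng, mean, spread) for _ in range(num_dims)]
        flows.append((src, dst, demand))
    return flows


if __name__ == "__main__":
    sizes = fat_tree_sizes(8)
    switches = sizes["edge"] + sizes["aggregation"] + sizes["core"]
    print(sizes["hosts"], "hosts and", switches, "switches")
    for flow in generate_flows(3, sizes["hosts"], 3, random.Random(0)):
        print(flow)
```

For an 8-ary fat-tree, these counts give the 128 end hosts and 80
packet processors used in the simulations.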

We carry out two groups of simulations for validating MRG and HGR
respectively. i) For evaluating the performance of algorithm MRG,
we compare it with three other algorithms of interest:
Single-Resource Shortest Path (SRSP), Single-Resource Green (SRG),
and Multi-Resource Shortest Path (MRSP). The energy saving
efficiency of the four algorithms is examined on two fat-tree
topologies of different scales under different numbers of flow
demands. We also explore the impact of the number of resource
dimensions under certain scenarios. ii) The performance of
algorithm HGR is compared with that of algorithm MRG. We first
study the impact of the number of resource dimensions. Then, under
certain scenarios, we examine the energy saving efficiency and the
running time of both MRG and HGR under different numbers of flow
demands. All results are averaged over 20 independent tests, and
all figures show the average and the standard deviation.


B. Performance of Algorithm MRG 

Energy savings. The simulation results for evaluating the energy
saving performance of MRG are depicted in Fig. 3(a), (b), and (c).
The energy saving ratio is represented by the number of inactive
nodes divided by the total number of nodes. It can be seen from
Fig. 3(a) that MRG outperforms the other three algorithms with
respect to energy savings under all scenarios. SRSP and MRSP
converge to very low energy saving ratios very quickly, while SRG
and MRG can exploit more energy saving potential by carefully
steering traffic flows. We also compare the performance of all the
algorithms under extremely heavy load scenarios. When the number
of flow demands exceeds the capability of the network (and
congestion happens at some critical nodes), MRSP and MRG block
more flows than SRSP and SRG, as can be seen in Fig. 3(b). This is
reasonable because MRSP and MRG take into account more resource
dimensions, so node capacity limits are reached more easily than
with single-resource solutions. However, when considering only one
resource dimension, some nodes will be congested due to the
neglect of the other resource dimensions, although more flow
demands are likely to be assigned. The numbers of nodes that are
congested under different numbers of flows are shown in Fig. 3(c).

Figure 3. Performance comparison for MRG (against SRSP, SRG, and
MRSP) under the scenarios where the network topology is given by
an 8-ary fat-tree with 208 nodes (128 end-hosts and 80 packet
processors): (a) energy savings under different numbers of flows;
(b) number of incomplete flows; (c) number of congested nodes;
(d) energy savings under different numbers of resource dimensions.
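
The two metrics used in this comparison can be made concrete with a
short sketch. The helper names below are hypothetical, and a node is
assumed to count as congested when its accumulated load exceeds the
normalized capacity of 1 in at least one resource dimension.

```python
from typing import Dict, List, Sequence


def energy_saving_ratio(total_nodes: int, active_nodes: int) -> float:
    """Number of inactive nodes divided by the total number of nodes."""
    if total_nodes <= 0:
        raise ValueError("total_nodes must be positive")
    return (total_nodes - active_nodes) / total_nodes


def congested_nodes(load: Dict[str, Sequence[float]],
                    capacity: float = 1.0) -> List[str]:
    """Nodes whose load exceeds the capacity in any resource dimension."""
    return [node for node, vec in load.items()
            if any(x > capacity for x in vec)]


def congested_in_dimension(load: Dict[str, Sequence[float]], k: int,
                           capacity: float = 1.0) -> List[str]:
    """Nodes that look congested when only dimension k is inspected."""
    return [node for node, vec in load.items() if vec[k] > capacity]
```

A node with load (0.9, 1.3) illustrates the effect discussed above:
a router that only tracks the first dimension sees it as feasible,
while the multi-resource view reports it as congested.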


Impact of the number of resource dimensions. Fig. 3(d) depicts the
simulation results for examining the scalability of MRG with
respect to the number of resource dimensions. It can be clearly
seen that the energy saving performance of MRG improves
significantly as the number of resource dimensions increases and
converges to a high level. This is because with more resource
dimensions, the proposed inversion-based node weight assignment
can distinguish nodes from one another more accurately, and thus
the path chosen for each flow will be more effective in terms of
energy saving.


C. Performance of Algorithm HGR 

We first compare the scalability of MRG and HGR with respect to
the number of resource dimensions. The simulation results are
shown in Fig. 4(a). It can be observed that HGR outperforms MRG
when the number of resource dimensions is very small. However, as
the number of resource dimensions increases, the energy saving
performance of HGR drops dramatically at a constant rate, while
MRG performs better and better and finally converges, as discussed
before. This is mainly because HGR is largely based on the vector
bin packing heuristic, which performs well when the number of
dimensions is small due to the greedy manner of item assignment
but has very poor scalability with respect to the number of
dimensions.

Figure 4. Performance comparison for HGR under the scenarios where
the network topology is given by an 8-ary fat-tree with 208 nodes
(128 end-hosts and 80 packet processors): (a) energy savings under
different numbers of resource dimensions; (b) energy savings under
different numbers of flows.

Table I
Running Time Statistics of the Algorithms (Unit: secs)

# of flows    20       40       60       80       100      120
alg. MRG      5.37     16.63    37.00    58.26    92.63    101.89
alg. HGR      0.026    0.078    0.192    0.400    0.647    0.681

We then choose a fair number of resource dimensions ($K = 3$) and
compare both the energy saving ratio and the running time of MRG
and HGR. The energy saving results are depicted in Fig. 4(b). We
observe that when the number of flow demands is not very large,
MRG and HGR are comparable in terms of energy savings, but HGR
suffers from some performance degradation when the number of flows
is very large. However, HGR compensates for this slight loss of
energy efficiency with a very significant reduction in running
time. As can be seen from Table I, for a fat-tree with 80 packet
processing nodes (i.e., $|E| = 192$), the running time of HGR is
around 0.5 percent of that of MRG, which confirms the lower bound
on the speedup $\Omega(|E|)$.
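
The relation between the two running times can be checked directly
from the values reported in Table I. The short script below only
recomputes ratios from those numbers; the variable names are
illustrative.

```python
# Running times in seconds, copied from Table I.
num_flows = [20, 40, 60, 80, 100, 120]
mrg_secs = [5.37, 16.63, 37.00, 58.26, 92.63, 101.89]
hgr_secs = [0.026, 0.078, 0.192, 0.400, 0.647, 0.681]

# HGR running time as a percentage of MRG running time, and the speedup.
percent_of_mrg = [100.0 * h / m for h, m in zip(hgr_secs, mrg_secs)]
speedup = [m / h for h, m in zip(hgr_secs, mrg_secs)]

if __name__ == "__main__":
    for n, p, s in zip(num_flows, percent_of_mrg, speedup):
        print(f"{n:>4} flows: HGR takes {p:.2f}% of MRG time, speedup {s:.1f}x")
```

Across all six workloads the HGR running time stays between roughly
0.4 and 0.7 percent of the MRG running time, that is, a speedup of
more than two orders of magnitude.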


VII. Discussion 

We discuss in this section some practical issues that are 
related to the application of the proposed technique. 


A. Model Extension 

Dynamic flow joining and leaving. The problem we have discussed
is for scenarios where a static set of flow demands is given a
priori, and the proposed MRG algorithm is dedicated to solving
this problem. In reality, however, flows join and leave the
network dynamically. Although we did not take into account the
dynamic property of the set of flow demands, the MRG algorithm can
be extended to the online case due to its progressive fashion.
When a new flow arrives in the network, we first check whether the
subnetwork formed by only the active nodes is capable of carrying
this flow. If so, we carry out the node-weight assignment and path
selection procedures to find a routing path and route the flow
using this path. Otherwise, we also include the inactive nodes,
with weights assigned, and find a path in the resulting network.
When a flow completes its transmission, we distinguish two types
of remaining flows: existing flows that have very short lifetimes
are left as they are, since they will complete shortly; for the
long-lived flows, we buffer the newly arrived flows until the
existing short-lived flows are gone and then carry out the MRG
algorithm to reroute the long-lived flows in order to achieve
energy efficiency. After this rerouting, the routes for the
buffered new flows are assigned as well. The node-weight
assignment and path selection procedures have low complexity, so
they can be applied to online scenarios conveniently. Moreover,
the centralized environment also enables parallel acceleration to
ensure real-time optimization.
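
The arrival handling described above can be sketched as follows.
This is a simplified illustration under several assumptions: the
function names and the `wake_penalty` parameter are hypothetical,
and uniform node weights with an extra penalty for inactive nodes
stand in for the inversion-based node-weight assignment, which is
not reproduced here. Nodes are switches identified by strings.

```python
import heapq
from typing import Callable, Dict, List, Optional, Set


def cheapest_path(adj: Dict[str, List[str]], src: str, dst: str,
                  allowed: Set[str],
                  weight: Callable[[str], float]) -> Optional[List[str]]:
    """Dijkstra over node weights, restricted to the allowed nodes."""
    if src not in allowed or dst not in allowed:
        return None
    dist = {src: weight(src)}
    prev: Dict[str, str] = {}
    heap = [(dist[src], src)]
    settled: Set[str] = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in settled:
            continue
        settled.add(u)
        if u == dst:
            break
        for v in adj[u]:
            if v in settled or v not in allowed:
                continue
            nd = d + weight(v)
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if dst not in dist:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


def route_new_flow(adj: Dict[str, List[str]],
                   residual: Dict[str, List[float]], active: Set[str],
                   demand: List[float], src: str, dst: str,
                   wake_penalty: float = 10.0) -> Optional[List[str]]:
    """Try the active subnetwork first, then fall back to all nodes."""
    def can_carry(v: str) -> bool:
        return all(r >= d for r, d in zip(residual[v], demand))

    feasible = {v for v in adj if can_carry(v)}
    # Step 1: only active nodes with enough residual capacity.
    path = cheapest_path(adj, src, dst, feasible & active, lambda v: 1.0)
    if path is None:
        # Step 2: also allow inactive nodes, at a higher (assumed) weight.
        path = cheapest_path(
            adj, src, dst, feasible,
            lambda v: 1.0 if v in active else 1.0 + wake_penalty)
    if path is None:
        return None  # the flow cannot be carried and is blocked
    for v in path:
        residual[v] = [r - d for r, d in zip(residual[v], demand)]
        active.add(v)
    return path
```

The function mutates `residual` and `active` in place, so that a
sequence of arrivals progressively fills the active nodes before any
inactive node is switched on.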

Heterogeneity. For the sake of tractability, we assumed in our
model that the resources required for processing all the packets
of a given flow on every node of the flow's routing path are
closely related to the size of the packets, and thus homogeneity
can be assumed. However, this may not be true when a flow requires
different processing functions on different nodes. Our model can
be extended to the heterogeneous case by treating each node on the
routing path independently: in the MRG algorithm, the weights on
nodes can be assigned in a hop-by-hop manner; in the HGR
algorithm, each vector bin packing instance is solved with
different resource requirements from the flow demands. We leave
more elaborate solutions for future work.

B. Practical Application Scenarios 

Named data networking. To generalize the role of the thin waist of
the IP architecture, Named Data Networking (NDN) was proposed,
where packets can name objects other than communication endpoints.
NDN routes and forwards packets based on names, which requires
high-performance processing (e.g., prefix matching) capability at
network nodes. Moreover, it is also useful for a network node to
cache the received data packets in its content store and use them
to satisfy future requests. These properties make NDN a good
application scenario for the NaaS model. As a result, the proposed
GreenNaaS solution has the potential to be used for achieving
energy efficiency in NDN.

Server-centric data center network architecture. In traditional
switch-centric networks, data packets are transmitted only through
proprietary network devices such as switches or routers. With the
possibility of having multi-port network cards on servers, several
server-centric network architectures such as BCube [29], SWCube
and SWKautz [30] have recently been proposed for data center
networks. In these server-centric network architectures, servers
are also involved in packet forwarding. By applying more
application-specific processing functions on packets, these
architectures can easily be extended to adopt the NaaS model, and
thus GreenNaaS can be used to save energy in those server-centric
data center networks, where the energy issue is more prominent
than in traditional switch-centric data center networks.

Middlebox orchestration. Middleboxes are special network devices
that are responsible for packet processing to provide
functionality such as NATs, firewalls or WAN optimizers.
Traditionally, these devices are proprietary and have been
implemented on closed hardware platforms, which are usually hard
to extend. To change this situation, many proposals have recently
been made to make middlebox functionalities software-centric
(e.g., [31], [32]). NaaS incorporates these by leveraging
application-specific in-network packet processing. Although there
are still many critical problems, such as service chain design,
that need more research effort, GreenNaaS provides a clear insight
into how to achieve energy efficiency in NaaS systems, which is
expected to be an important issue in the near future.

VIII. Conclusion 

We study the energy-efficient multi-resource routing problem,
which arises from the recently proposed cloud networking model
NaaS. This optimization problem differs from the traditional
energy-efficient routing problem in that node capacities and flow
demands are represented by vectors in multiple dimensions. We
provide a simple iterative routing scheme which selects flows
iteratively to exhaust the residual capacities of active nodes and
assigns routes to flows based on the distributions of node
residual capacities and flow demands. To leverage the structural
property of data center network topologies, we also provide a
topology-aware heuristic designed for fat-trees, which can provide
comparably good energy efficiency while significantly reducing the
computation time.